Abstract: The capacity region of the index coding problem
is characterized through the notion of confusion
graph and its fractional chromatic number. Based on this 
multiletter characterization, several structural properties 
of the capacity region are established, some of which are 
already noted by Tahmasbi, Shahrasbi, and Gohari, but 
proved here with simple and more direct graph-theoretic 
arguments. In particular, the capacity region of a given
index coding problem is shown to be a simple functional
of the capacity regions of smaller subproblems when the
interaction between the subproblems is none, one-way, or
complete.

I. Introduction 

Suppose that a sender wishes to communicate a tuple
of n messages, $x^n = (x_1, \ldots, x_n)$, $x_j \in \{0,1\}^{t_j}$, to
their corresponding receivers using a shared noiseless
channel. Receiver $j \in [1:n] := \{1, 2, \ldots, n\}$ has prior
knowledge of a subset $x(A_j) := \{x_i : i \in A_j\}$,
$A_j \subseteq [1:n] \setminus \{j\}$, of the messages and wishes to recover $x_j$.
It is assumed that the sender is aware of $A_1, \ldots, A_n$.
The goal is to minimize the amount of information that 
should be broadcast from the sender to the receivers so 
that every receiver can recover its desired message. 

Any instance of this problem, referred to collectively 
as the index coding problem , is fully specified by the 
side information sets $A_1, \ldots, A_n$. Equivalently, it can
be specified by a side information graph G with n
nodes, in which a directed edge $i \to j$ represents that
receiver j has message i as side information, i.e., $i \in A_j$
(see Fig. 1(a)). Thus, we often identify an index coding 
problem with its side information graph and simply write 
“index coding problem G.” 

A $(t_1, \ldots, t_n, r)$ index code is defined by

• an encoder $\phi : \prod_{i=1}^n \{0,1\}^{t_i} \to \{0,1\}^r$ that maps each
n-tuple of messages $x^n$ to an r-bit index and

• n decoders $\psi_j : \{0,1\}^r \times \prod_{k \in A_j} \{0,1\}^{t_k} \to \{0,1\}^{t_j}$
that map the received index $\phi(x^n)$ and the
side information $x(A_j)$ back to $x_j$ for $j \in [1:n]$.

Thus, for every $x^n \in \prod_{i=1}^n \{0,1\}^{t_i}$,

$$\psi_j(\phi(x^n), x(A_j)) = x_j, \quad j \in [1:n].$$

A rate tuple $(R_1, \ldots, R_n)$ is said to be achievable for the
index coding problem G if there exists a $(t_1, \ldots, t_n, r)$
index code such that

$$R_j \le \frac{t_j}{r}, \quad j \in [1:n].$$

Fig. 1. (a) The graph representation for the index coding problem
with $A_1 = \{2,3\}$, $A_2 = \{1\}$, and $A_3 = \{1,2\}$. (b) The confusion
graph corresponding to the integer tuple $(t_1, t_2, t_3) = (1,1,1)$. Each
node is labeled with a message tuple.

The capacity region $\mathscr{C}$ of the index coding problem is
defined as the closure of the set of achievable rate tuples.

Since Birk and Kol [1] introduced the index coding 
problem in 1998, this simple yet fundamental problem
has attracted several research communities (see [2]-[5] for a
subset of recent contributions). The capacity region has
been established for all 9,608 index coding problems of 
n = 5 messages [6] (which includes all index coding 
problems up to five messages by taking projections). 
However, the coding schemes developed for small n 
prove to be suboptimal when n becomes large and 
there is no known computable characterization for the 
capacity region of a general index coding problem. On 
the theoretical side, there is no known algorithm even
to approximate the capacity region within a factor of
$O(n^{1-\epsilon})$ for $\epsilon \in (0,1)$. On the computational side, the
number of index coding problems blows up quickly in
n (for example, there are 1,540,944 distinct instances of
index coding problems for n = 6) and it also becomes
quite challenging to compare existing inner and outer 
bounds on the capacity region of each problem as n 
increases. 

As an intermediate step towards characterizing the
capacity region (analytically, approximately, or numerically),
we study some structural properties of the capacity
region. In particular, we show that if the side information
graph G can be partitioned into two vertex-induced
subgraphs $G_1$ and $G_2$, then the capacity region $\mathscr{C}$ of
the index coding problem G can be characterized as a
simple functional of the capacity regions $\mathscr{C}_1$ and $\mathscr{C}_2$ of
$G_1$ and $G_2$, respectively, provided that

1) there is no edge between $G_1$ and $G_2$, or

2) more generally, there is no edge from $G_2$ to $G_1$,
or

3) every node in $G_1$ is connected to every node in
$G_2$ and vice versa.

The immediate utility of these structural properties is that 
one can reduce the number of index coding problems that 
need to be studied. For example, we can check (details
not shown) that 1,366,783 (89%) out of 1,540,944 index
coding problems for n = 6 satisfy one of the three
aforementioned conditions or another simple case (Proposition 4
in Section III), significantly narrowing the set of
problems that are worth further investigation.

We must note that the first two properties have
already been established by Tahmasbi, Shahrasbi, and Gohari
[7, Th. 2] using a somewhat convoluted argument based
on joint typicality encoding and covering. In comparison,
our approach is more direct and based on the definition
of the capacity region itself. As discussed more
precisely in Section III, our starting point is a graph- 
theoretic characterization of the index coding capacity 
region using the notion of confusion graph. This notion 
was introduced by Alon, Hassidim, Lubetzky, Stav, and 
Weinstein [8], who characterized the optimal broadcast
rate (the reciprocal of the symmetric capacity) using the
chromatic number of the confusion graph. We generalize
and tighten their approach by connecting the capacity
region with the fractional chromatic number of the confusion
graph. This allows us to utilize well-known results
from fractional graph theory [9] such as the identities 
on fractional chromatic numbers for graph products (see 
Section II) to establish several structural properties of the 
capacity region. Our approach based on confusion graph 
and fractional chromatic number seems to be broadly 
applicable beyond these structural results. Although it 
is not presented here, a similar method generalizes and 
tightens the recent result by Mazumdar [10] on the 
duality between index coding and distributed storage. 

Throughout the paper, the base of logarithm is 2. 


II. Mathematical Preliminaries 
A. Confusion Graphs 

We generalize the notion of confusion graph, which
was originally introduced in [8] for equal-length messages.

Given an index coding problem G, two tuples of
n messages $x^n, z^n \in \prod_{i=1}^n \{0,1\}^{t_i}$ are said to be
confusable at receiver $j \in [1:n]$ if $x_j \ne z_j$ and
$x_i = z_i$ for all $i \in A_j$. We simply say $x^n$ and $z^n$
are confusable if they are confusable at some receiver j.
Given an index coding problem G and a tuple of message
lengths $\mathbf{t} = (t_1, \ldots, t_n)$, the confusion graph $\Gamma_{\mathbf{t}}(G)$
is an undirected graph with $\prod_{i=1}^n 2^{t_i}$ vertices such
that every vertex corresponds to a message tuple $x^n$
and two vertices are connected iff (if and only if) the
corresponding message tuples are confusable.

The confusion graph of the index coding problem
with side information graph in Fig. 1(a) corresponding
to $(t_1, t_2, t_3) = (1,1,1)$ is depicted in Fig. 1(b).
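The definition of confusability translates directly into a brute-force construction. The following is a minimal sketch, not taken from the paper, that enumerates the confusion graph for the side information sets of Fig. 1(a); the function names and the 0-based indexing of messages are assumptions made for illustration.

```python
from itertools import combinations, product

def message_tuples(t):
    """All message tuples x^n, where message j is a bit string of length t[j]."""
    per_message = [list(product((0, 1), repeat=t_j)) for t_j in t]
    return list(product(*per_message))

def confusable(x, z, side_info):
    """True iff x and z are confusable at some receiver j.

    side_info[j] is the set A_j, written with 0-based indices."""
    return any(
        x[j] != z[j] and all(x[i] == z[i] for i in A_j)
        for j, A_j in enumerate(side_info)
    )

def confusion_graph(side_info, t):
    """Return (vertices, edges) of the confusion graph for length tuple t."""
    vertices = message_tuples(t)
    edges = {
        frozenset((x, z))
        for x, z in combinations(vertices, 2)
        if confusable(x, z, side_info)
    }
    return vertices, edges

# Fig. 1(a): A_1 = {2, 3}, A_2 = {1}, A_3 = {1, 2}, shifted to 0-based indices.
SIDE_INFO = [{1, 2}, {0}, {0, 1}]
vertices, edges = confusion_graph(SIDE_INFO, (1, 1, 1))
print(len(vertices), len(edges))
```

The enumeration is exponential in the total message length, so it is only meant for very small instances such as this one.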


B. Graph Coloring 

A (vertex) coloring of an undirected graph $\Gamma$ is a
mapping that assigns a color to each vertex such that
no two adjacent vertices share the same color. The
chromatic number $\chi(\Gamma)$ is the minimum number of
colors such that a coloring of the graph exists.

More generally, a b-fold coloring assigns a set of b
colors to each vertex such that no two adjacent vertices
share a color. The b-fold chromatic number
$\chi^{(b)}(\Gamma)$ is the minimum number of colors such that a
b-fold coloring exists. The fractional chromatic number
of the graph is defined as

$$\chi_f(\Gamma) = \lim_{b \to \infty} \frac{\chi^{(b)}(\Gamma)}{b} = \inf_b \frac{\chi^{(b)}(\Gamma)}{b},$$

where the limit exists since $\chi^{(b)}(\Gamma)$ is subadditive.
Consequently,

$$\chi_f(\Gamma) \le \chi(\Gamma). \tag{1}$$


Let $\mathcal{I}$ be the collection of all independent sets in $\Gamma$ (i.e.,
sets of vertices such that no two vertices are adjacent).
The chromatic number and the fractional chromatic
number are also characterized as the solution to the
following optimization problem

$$\text{minimize} \quad \sum_{S \in \mathcal{I}} \rho_S$$

$$\text{subject to} \quad \sum_{S \in \mathcal{I} : j \in S} \rho_S \ge 1 \quad \text{for every vertex } j \text{ of } \Gamma.$$

When the optimization variables $\rho_S$, $S \in \mathcal{I}$, take integer
values $\{0, 1\}$, then the (integral) solution is the chromatic
number. If this constraint is relaxed and $\rho_S \in [0,1]$, then
the (rational) solution is the fractional chromatic number
[9].
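To make the gap in (1) concrete, the sketch below computes b-fold chromatic numbers by exhaustive backtracking. The 5-cycle is an assumed example that does not appear in the paper, and the function name is hypothetical; the search is only practical for tiny graphs.

```python
from itertools import combinations

def b_fold_chromatic_number(num_vertices, edges, b):
    """Smallest k such that every vertex receives b of k colors and
    adjacent vertices receive disjoint color sets (backtracking search)."""
    neighbors = {v: set() for v in range(num_vertices)}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    def extend(assignment, palette):
        v = len(assignment)  # vertices are colored in the order 0, 1, 2, ...
        if v == num_vertices:
            return True
        for colors in palette:
            earlier = (u for u in neighbors[v] if u < v)
            if all(colors.isdisjoint(assignment[u]) for u in earlier):
                if extend(assignment + [colors], palette):
                    return True
        return False

    k = b
    while True:
        palette = [frozenset(c) for c in combinations(range(k), b)]
        if extend([], palette):
            return k
        k += 1

# Assumed example: the 5-cycle.
C5 = [(i, (i + 1) % 5) for i in range(5)]
chi = b_fold_chromatic_number(5, C5, 1)
chi2 = b_fold_chromatic_number(5, C5, 2)
print(chi, chi2 / 2)
```

For the 5-cycle the ordinary chromatic number is 3, while the 2-fold coloring already certifies a fractional chromatic number of at most 5/2 (which is in fact its exact value), so the inequality in (1) can be strict.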



C. Graph Products

Generally speaking, a graph product is a binary operation
on two (undirected) graphs $\Gamma_1$ and $\Gamma_2$ that produces
a graph $\Gamma$ on the Cartesian product of the original vertex
sets with the edge set constructed from the original edge
sets according to certain rules. In this section, we review
a few definitions of graph products and their (fractional)
chromatic numbers. In the following, $v_1 \sim v_2$ denotes
that there exists an edge between $v_1$ and $v_2$. The notation
$V(\Gamma)$ means the vertex set of a graph $\Gamma$.

The disjunctive product $\Gamma = \Gamma_1 * \Gamma_2$ is defined as
$V(\Gamma) = V(\Gamma_1) \times V(\Gamma_2)$ and $(u_1, u_2) \sim (v_1, v_2)$ iff

$$u_1 \sim v_1 \text{ or } u_2 \sim v_2.$$

The fractional chromatic number of the disjunctive product
is multiplicative.

Lemma 1 (Scheinerman and Ullman [9, Cor. 3.4.2]).

$$\chi_f(\Gamma_1 * \Gamma_2) = \chi_f(\Gamma_1)\chi_f(\Gamma_2).$$

Note that the chromatic number satisfies the following
relationship [9, Prop. 3.4.4]:

$$\chi(\Gamma_1 * \Gamma_2) \le \chi(\Gamma_1)\chi(\Gamma_2). \tag{2}$$

The chromatic and fractional chromatic numbers of the
power of a graph scale in the same exponential rate.

Lemma 2 (Scheinerman and Ullman [9, Cor. 3.4.3]).
Let $\Gamma^k$ be the k-th power of $\Gamma$ in disjunctive product.
Then

$$\chi_f(\Gamma) = \lim_{k \to \infty} \sqrt[k]{\chi(\Gamma^k)} = \inf_k \sqrt[k]{\chi(\Gamma^k)}.$$

The lexicographic product $\Gamma = \Gamma_1 \bullet \Gamma_2$ is defined as
$V(\Gamma) = V(\Gamma_1) \times V(\Gamma_2)$ and $(u_1, u_2) \sim (v_1, v_2)$ iff

$$u_1 \sim v_1 \text{ or } (u_1 = v_1 \text{ and } u_2 \sim v_2).$$

Note that the lexicographic product of graphs is not commutative.
Nonetheless, its fractional chromatic number is
still multiplicative.

Lemma 3 (Scheinerman and Ullman [9, Cor. 3.4.5]).

$$\chi_f(\Gamma_1 \bullet \Gamma_2) = \chi_f(\Gamma_1)\chi_f(\Gamma_2).$$

Note that the chromatic number satisfies the following
relationship [11, Th. 1]:

$$\chi(\Gamma_1 \bullet \Gamma_2) \le \chi(\Gamma_1)\chi(\Gamma_2). \tag{3}$$

The Cartesian product $\Gamma = \Gamma_1 \square \Gamma_2$ is defined as
$V(\Gamma) = V(\Gamma_1) \times V(\Gamma_2)$ and $(u_1, u_2) \sim (v_1, v_2)$ iff

$$(u_1 = v_1 \text{ and } u_2 \sim v_2) \text{ or } (u_2 = v_2 \text{ and } u_1 \sim v_1).$$

This product does not increase the chromatic number.

Lemma 4 (Sabidussi [12, Lemma 2.6]).

$$\chi(\Gamma_1 \square \Gamma_2) = \max\{\chi(\Gamma_1), \chi(\Gamma_2)\}.$$


III. Main Results

A. Capacity Region via the Confusion Graph 

We first state a simple generalization of the result by 
Alon, et al. [8, Th. 1.1]. 


Proposition 1. A rate tuple $(R_1, \ldots, R_n)$ is achievable
for the index coding problem G iff there exists an integer
tuple $\mathbf{t} = (t_1, \ldots, t_n)$ such that

$$R_j \le \frac{t_j}{\lceil \log(\chi(\Gamma_{\mathbf{t}}(G))) \rceil}, \quad j \in [1:n]. \tag{4}$$

Proof: Sufficiency (achievability). For a given tuple
$\mathbf{t} = (t_1, \ldots, t_n)$, consider a coloring of the vertices of
the confusion graph $\Gamma = \Gamma_{\mathbf{t}}(G)$ with $\chi(\Gamma)$ colors. This
partitions the vertices of $\Gamma$ into $\chi(\Gamma)$ independent sets.
Now by the definition of the confusion graph, no two
message tuples in each independent set are confusable
and therefore assigning an index to each independent
set yields a valid index code. The total number of
codewords of this index code is $\chi(\Gamma)$, which requires
$r = \lceil \log(\chi(\Gamma)) \rceil$ bits to be broadcast. This proves the
existence of a $(t_1, \ldots, t_n, \lceil \log(\chi(\Gamma_{\mathbf{t}}(G))) \rceil)$ index code.

Necessity (converse). Consider any $(t_1, \ldots, t_n, r)$ index
code, which assigns at most $2^r$ distinct indices to
message tuples. By definition, all the message tuples
mapped to an index form an independent set of the
confusion graph $\Gamma = \Gamma_{\mathbf{t}}(G)$. Moreover, every message
tuple is mapped to some index so that these independent
sets partition $V(\Gamma)$. Thus, $\chi(\Gamma) \le 2^r$, or equivalently,
$r \ge \lceil \log(\chi(\Gamma)) \rceil$. Therefore, any achievable
$(R_1, \ldots, R_n)$ must satisfy

$$R_j \le \frac{t_j}{\lceil \log(\chi(\Gamma_{\mathbf{t}}(G))) \rceil}, \quad j \in [1:n],$$

for some $\mathbf{t} = (t_1, \ldots, t_n)$. ■
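The correspondence between index codes and colorings used in this proof can be checked on the problem of Fig. 1(a). The sketch below assumes a particular 2-bit linear code, which is an illustrative choice and not taken from the paper, and verifies that the message tuples sharing an index form independent sets and that every receiver decodes unambiguously.

```python
from itertools import combinations, product
from math import ceil, log2

# Fig. 1(a) with 0-based indices: A_1 = {2, 3}, A_2 = {1}, A_3 = {1, 2}.
SIDE_INFO = [{1, 2}, {0}, {0, 1}]

def confusable(x, z):
    """True iff x and z are confusable at some receiver."""
    return any(
        x[j] != z[j] and all(x[i] == z[i] for i in A_j)
        for j, A_j in enumerate(SIDE_INFO)
    )

def encoder(x):
    """Assumed 2-bit index code (not from the paper): (x1 XOR x2, x2 XOR x3)."""
    return (x[0] ^ x[1], x[1] ^ x[2])

messages = list(product((0, 1), repeat=3))  # t = (1, 1, 1)

# Group the message tuples by the index they are mapped to.
classes = {}
for x in messages:
    classes.setdefault(encoder(x), []).append(x)

# Converse direction: every class must be an independent set.
classes_independent = all(
    not confusable(x, z)
    for members in classes.values()
    for x, z in combinations(members, 2)
)

def decode(j, index, x):
    """Receiver j keeps the tuples in its class that match its side information
    and returns the set of values of message j that remain possible."""
    return {
        z[j] for z in classes[index]
        if all(z[i] == x[i] for i in SIDE_INFO[j])
    }

decoding_ok = all(
    decode(j, encoder(x), x) == {x[j]}
    for x in messages
    for j in range(3)
)
r = ceil(log2(len(classes)))
print(classes_independent, decoding_ok, r)
```

The code uses four indices, i.e., r = 2 bits. Since the four message tuples with a common value of $x_1$ are pairwise confusable (at receiver 2 or receiver 3), no coloring of this confusion graph can use fewer than four colors, so this r matches the value in (4) for $\mathbf{t} = (1,1,1)$.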

The ceiling operation in (4), which results from the
fact that the index is communicated in bits, is not essential.
By using the code that maps $x^n \in \prod_{i=1}^n \{0,1\}^{t_i}$
to $[1 : \chi(\Gamma_{\mathbf{t}}(G))]$ repeatedly k times, one can easily
construct a code that maps $x^n \in \prod_{i=1}^n \{0,1\}^{k t_i}$ to
$[1 : \chi(\Gamma_{\mathbf{t}}(G))]^k$, thus achieving rates

$$R_j \le \frac{k t_j}{k \log(\chi(\Gamma_{\mathbf{t}}(G))) + 1} \le \frac{k t_j}{\lceil k \log(\chi(\Gamma_{\mathbf{t}}(G))) \rceil}, \quad j \in [1:n]. \tag{5}$$

Letting $k \to \infty$ in (5) establishes the following.
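A small numerical sketch of this limiting argument follows. The chromatic number 3 is a hypothetical value, chosen only because its logarithm is not an integer, and the function name is an assumption.

```python
from math import ceil, log2

def repeated_code_rate(t_j, chi, k):
    """Rate k*t_j / ceil(k*log2(chi)) of message j when a code with
    chi indices is applied k times and the k indices are sent jointly."""
    return k * t_j / ceil(k * log2(chi))

CHI = 3  # hypothetical chromatic number of a confusion graph
LIMIT = 1 / log2(CHI)  # rate without the ceiling, for t_j = 1

rates = {k: repeated_code_rate(1, CHI, k) for k in (1, 2, 5, 100, 1000)}
for k, rate in rates.items():
    print(k, rate, LIMIT - rate)
```

The rate never exceeds the ceiling-free value $t_j / \log(\chi)$, and the loss caused by rounding up to an integer number of bits shrinks as k grows.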


Proposition 2. The capacity region $\mathscr{C}$ of the index
coding problem G is the closure of all rate tuples
$(R_1, \ldots, R_n)$ such that

$$R_j \le \frac{t_j}{\log(\chi(\Gamma_{\mathbf{t}}(G)))}, \quad j \in [1:n], \tag{6}$$

for some $\mathbf{t} = (t_1, \ldots, t_n)$.



We now state a stronger result, in terms of the frac¬ 
tional chromatic number, which will prove to be useful in 
establishing structural properties of the capacity region. 


Theorem 1. The capacity region $\mathscr{C}$ of the index coding
problem G is the closure of all rate tuples $(R_1, \ldots, R_n)$
such that

$$R_j \le \frac{t_j}{\log(\chi_f(\Gamma_{\mathbf{t}}(G)))}, \quad j \in [1:n], \tag{7}$$

for some $\mathbf{t} = (t_1, \ldots, t_n)$.


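Proof: The necessity follows by (1) and Proposition 1.

Let $\epsilon > 0$. For each $\mathbf{t} = (t_1, \ldots, t_n)$ and the
corresponding confusion graph $\Gamma_{\mathbf{t}}(G)$, Lemma 2 implies
that there exists an integer k such that

$$\sqrt[k]{\chi(\Gamma_{\mathbf{t}}^k(G))} \le \chi_f(\Gamma_{\mathbf{t}}(G)) + \epsilon. \tag{8}$$

It can be also checked that the set of edges of $\Gamma_{\mathbf{t}}^k(G)$ contains
the set of edges of $\Gamma_{k\mathbf{t}}(G)$, which, when combined
with (8), implies that $\sqrt[k]{\chi(\Gamma_{k\mathbf{t}}(G))} \le \chi_f(\Gamma_{\mathbf{t}}(G)) + \epsilon$,
or equivalently,

$$\frac{t_j}{\log(\chi_f(\Gamma_{\mathbf{t}}(G)) + \epsilon)} \le \frac{k t_j}{\log(\chi(\Gamma_{k\mathbf{t}}(G)))}, \quad j \in [1:n].$$

Thus, by Proposition 2, if $(R_1, \ldots, R_n)$ satisfies

$$R_j \le \frac{t_j}{\log(\chi_f(\Gamma_{\mathbf{t}}(G)) + \epsilon)}, \quad j \in [1:n],$$

then it must be in the capacity region. Since $\mathscr{C}$ is closed,
taking $\epsilon \to 0$ completes the proof. ■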


B. Capacity Region via Confusion Graph Products 

Throughout this subsection, we assume that $G_1$ and
$G_2$ are two vertex-induced subgraphs of G such that
$V(G_1) = [1 : n_1]$ and $V(G_2) = [n_1 + 1 : n]$ partition
$V(G) = [1 : n]$. We denote the capacity regions of the
index coding problems G, $G_1$ and $G_2$ by $\mathscr{C}$, $\mathscr{C}_1$ and $\mathscr{C}_2$,
respectively.

Proposition 3. If G has no edge between $G_1$ and $G_2$,
then

$$\mathscr{C} = \bigcup_{\alpha \in [0,1]} \{(\alpha \mathbf{R}_1, (1-\alpha) \mathbf{R}_2) : \mathbf{R}_1 \in \mathscr{C}_1, \mathbf{R}_2 \in \mathscr{C}_2\}.$$

In other words, the capacity region of G is achieved
by time division between the optimal coding schemes
for two disjoint subproblems $G_1$ and $G_2$.

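Proof: Since the reverse inclusion follows by time division
between index codes for $G_1$ and $G_2$, it suffices to show that

$$\mathscr{C} \subseteq \bigcup_{\alpha \in [0,1]} \{(\alpha \mathbf{R}_1, (1-\alpha) \mathbf{R}_2) : \mathbf{R}_1 \in \mathscr{C}_1, \mathbf{R}_2 \in \mathscr{C}_2\}.$$

Let $x^n = (\mathbf{x}_1, \mathbf{x}_2)$ and $z^n = (\mathbf{z}_1, \mathbf{z}_2)$ be two message
tuples, and $\mathbf{t} = (\mathbf{t}_1, \mathbf{t}_2)$ be their common length tuple,
where $\mathbf{x}_i$, $\mathbf{z}_i$, and $\mathbf{t}_i$ correspond to the subproblem
$G_i$, $i = 1, 2$. By the definition of confusability, $x^n$
and $z^n$ are confusable iff they are confusable at some
receiver $j \in V(G_1)$ or confusable at some receiver
$j \in V(G_2)$. Since there is no edge between $G_1$ and $G_2$,
these local confusability conditions are equivalent to the
confusability of $\mathbf{x}_1$ and $\mathbf{z}_1$ for the subproblem $G_1$ and
the confusability of $\mathbf{x}_2$ and $\mathbf{z}_2$ for the subproblem $G_2$,
respectively. In other words, $x^n$ and $z^n$ are confusable for
G iff $\mathbf{x}_1$ and $\mathbf{z}_1$ are confusable for $G_1$ or $\mathbf{x}_2$ and $\mathbf{z}_2$ are
confusable for $G_2$. Thus, $\Gamma_{\mathbf{t}}(G) = \Gamma_{\mathbf{t}_1}(G_1) * \Gamma_{\mathbf{t}_2}(G_2)$
and by Lemma 1 for disjunctive product,

$$\log(\chi_f(\Gamma_{\mathbf{t}}(G))) = \log(\chi_f(\Gamma_{\mathbf{t}_1}(G_1))) + \log(\chi_f(\Gamma_{\mathbf{t}_2}(G_2))) =: l_1 + l_2.$$

We now let $\alpha = l_1/(l_1 + l_2)$ and apply Theorem 1.
Before closure, any rate tuple in $\mathscr{C}$ should satisfy

$$R_j \le \begin{cases} \alpha \dfrac{t_j}{l_1}, & j \in V(G_1), \\ (1-\alpha) \dfrac{t_j}{l_2}, & j \in V(G_2). \end{cases}$$

But again by Theorem 1, $(t_j/l_1 : j \in V(G_1)) \in \mathscr{C}_1$ and
$(t_j/l_2 : j \in V(G_2)) \in \mathscr{C}_2$, which completes the proof. ■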


We now state a stronger version of Proposition 3, originally
established by Tahmasbi, Shahrasbi, and Gohari
[7]; see also [5, Th. 8] for a related but much weaker
statement.

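Theorem 2 (Tahmasbi, Shahrasbi, and Gohari [7,
Th. 2]). If G has no edge from $G_2$ to $G_1$, then

$$\mathscr{C} = \bigcup_{\alpha \in [0,1]} \{(\alpha \mathbf{R}_1, (1-\alpha) \mathbf{R}_2) : \mathbf{R}_1 \in \mathscr{C}_1, \mathbf{R}_2 \in \mathscr{C}_2\}.$$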

Once again the capacity region is achieved by time
division. Moreover, in light of Proposition 3 and the
Farkas lemma [13, Th. 2.2] (that is, each edge in a
directed graph either lies on a directed cycle or belongs
to a directed cut but not both), Theorem 2 implies that
removing edges of G that do not lie on a directed cycle
does not change the capacity region.

Proof: Assume without loss of generality that there
exists an edge from every node in $G_1$ to every node
in $G_2$. Now, since every node in $G_2$ has every node
(message) in $G_1$ as side information and no node in
$G_1$ has any node in $G_2$ as side information, $x^n$ and
$z^n$ are confusable for G iff $\mathbf{x}_1$ and $\mathbf{z}_1$ are confusable
for $G_1$, or $\mathbf{x}_1 = \mathbf{z}_1$ and $\mathbf{x}_2$ and $\mathbf{z}_2$ are confusable for
$G_2$ (recall the notation in the proof of Proposition 3).
Thus, $\Gamma_{\mathbf{t}}(G) = \Gamma_{\mathbf{t}_1}(G_1) \bullet \Gamma_{\mathbf{t}_2}(G_2)$ and by Lemma 3 for
lexicographic product,

$$\log(\chi_f(\Gamma_{\mathbf{t}}(G))) = \log(\chi_f(\Gamma_{\mathbf{t}_1}(G_1))) + \log(\chi_f(\Gamma_{\mathbf{t}_2}(G_2))).$$

The rest of the proof follows the identical steps to that
of Proposition 3. ■



The only difference between Proposition 3 and Theorem 2
lies in which product of confusion subgraphs
needs to be taken: the disjunctive product for two
separate subproblems, and the lexicographic product
for two subproblems dependent in only one direction.
Note that the main tool we use from fractional graph
theory (cf. Lemmas 1 and 3) is

$$\chi_f(\Gamma_{\mathbf{t}}(G)) \ge \chi_f(\Gamma_{\mathbf{t}_1}(G_1)) \chi_f(\Gamma_{\mathbf{t}_2}(G_2)). \tag{9}$$

This implies that as long as an index coding problem
can be partitioned into two subproblems and the corresponding
(nonstandard) graph product of the confusion
subgraphs satisfies (9), the capacity region has the same
form as in Proposition 3 and Theorem 2. Also note that
with the (integral) chromatic number, an inequality like
(9) holds in the opposite direction (cf. (2) and (3)).
This shows the major advantage of Theorem 1 over
Proposition 2.

Next, we consider index coding problems with side 
information graphs that contain a complete bipartite 
graph as an edge-induced subgraph. 

Theorem 3. If there are edges from every node in $G_1$
to every node in $G_2$ and vice versa, then

$$\mathscr{C} = \{(\mathbf{R}_1, \mathbf{R}_2) : \mathbf{R}_1 \in \mathscr{C}_1, \mathbf{R}_2 \in \mathscr{C}_2\}.$$

In other words, the capacity region of G is achieved
by simultaneously using the optimal coding schemes for
two disjoint subproblems $G_1$ and $G_2$.

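Proof: Since every node in $G_1$ has every message
in $G_2$ as side information and every node in $G_2$ has
every message in $G_1$ as side information, $x^n$ and $z^n$
are confusable for G iff $\mathbf{x}_1 = \mathbf{z}_1$ and $\mathbf{x}_2$ and $\mathbf{z}_2$ are
confusable for $G_2$, or $\mathbf{x}_2 = \mathbf{z}_2$ and $\mathbf{x}_1$ and $\mathbf{z}_1$ are
confusable for $G_1$. Thus, $\Gamma_{\mathbf{t}}(G) = \Gamma_{\mathbf{t}_1}(G_1) \square \Gamma_{\mathbf{t}_2}(G_2)$
and by Lemma 4 for Cartesian product,

$$\chi(\Gamma_{\mathbf{t}}(G)) = \max\{\chi(\Gamma_{\mathbf{t}_1}(G_1)), \chi(\Gamma_{\mathbf{t}_2}(G_2))\}. \tag{10}$$

By Proposition 2, before closure, any rate tuple in $\mathscr{C}$
should satisfy

$$R_j \le \frac{t_j}{\log(\chi(\Gamma_{\mathbf{t}}(G)))}, \quad j \in [1:n],$$

for some $\mathbf{t} = (t_1, \ldots, t_n)$. Combining this with (10), we
have for $i = 1, 2$

$$R_j \le \frac{t_j}{\log(\chi(\Gamma_{\mathbf{t}_i}(G_i)))} =: \frac{t_j}{l_i}, \quad j \in V(G_i).$$

By applying Proposition 2 once again, $(t_j/l_1 : j \in V(G_1)) \in \mathscr{C}_1$
and $(t_j/l_2 : j \in V(G_2)) \in \mathscr{C}_2$, which
completes the proof. ■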
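The three product identities used in the proofs of Proposition 3, Theorem 2, and Theorem 3 can be checked by enumeration on a small instance. In the sketch below, $G_1$ has two nodes that know each other's message and $G_2$ is a single node; this instance, the helper names, and the 0-based indexing are assumptions made for illustration and do not come from the paper.

```python
from itertools import combinations, product

def confusion_graph(side_info, t):
    """Vertices and edges of the confusion graph; side_info[j] is A_j (0-based)."""
    blocks = [list(product((0, 1), repeat=t_j)) for t_j in t]
    vertices = list(product(*blocks))

    def confusable(x, z):
        return any(
            x[j] != z[j] and all(x[i] == z[i] for i in A_j)
            for j, A_j in enumerate(side_info)
        )

    edges = {
        frozenset((x, z))
        for x, z in combinations(vertices, 2)
        if confusable(x, z)
    }
    return vertices, edges

def product_edges(graph1, graph2, rule):
    """Edges of a graph product, with vertex (u1, u2) written as the tuple u1 + u2.

    rule(adj1, eq1, adj2, eq2) decides whether (u1, u2) and (v1, v2) are adjacent."""
    (vertices1, edges1), (vertices2, edges2) = graph1, graph2
    pairs = list(product(vertices1, vertices2))
    edges = set()
    for (u1, u2), (v1, v2) in combinations(pairs, 2):
        adj1 = frozenset((u1, v1)) in edges1
        adj2 = frozenset((u2, v2)) in edges2
        if rule(adj1, u1 == v1, adj2, u2 == v2):
            edges.add(frozenset((u1 + u2, v1 + v2)))
    return edges

RULES = {
    "disjunctive": lambda a1, e1, a2, e2: a1 or a2,
    "lexicographic": lambda a1, e1, a2, e2: a1 or (e1 and a2),
    "cartesian": lambda a1, e1, a2, e2: (e1 and a2) or (e2 and a1),
}

def joint_side_info(side1, side2, g1_to_g2, g2_to_g1):
    """Side information sets of G; an edge i -> j means receiver j knows message i."""
    n1, n2 = len(side1), len(side2)
    from_g2 = set(range(n1, n1 + n2)) if g2_to_g1 else set()
    from_g1 = set(range(n1)) if g1_to_g2 else set()
    first = [set(A) | from_g2 for A in side1]
    second = [{i + n1 for i in A} | from_g1 for A in side2]
    return first + second

SIDE1, T1 = [{1}, {0}], (1, 1)   # G_1: two nodes with mutual side information
SIDE2, T2 = [set()], (1,)        # G_2: a single node
CASES = {                        # (all edges G_1 -> G_2, all edges G_2 -> G_1)
    "disjunctive": (False, False),
    "lexicographic": (True, False),
    "cartesian": (True, True),
}

results = {}
for name, (g1_to_g2, g2_to_g1) in CASES.items():
    side = joint_side_info(SIDE1, SIDE2, g1_to_g2, g2_to_g1)
    _, joint_edges = confusion_graph(side, T1 + T2)
    expected = product_edges(
        confusion_graph(SIDE1, T1), confusion_graph(SIDE2, T2), RULES[name]
    )
    results[name] = (joint_edges == expected, len(joint_edges))
print(results)
```

In this instance the confusion graph of $G_1$ is a 4-cycle and that of $G_2$ is a single edge, so the complete-interaction case yields the 3-dimensional cube, which is 2-colorable in agreement with Lemma 4.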


C. Capacity Region via Degraded Side Information Sets 

Here we consider the index coding problem G with
side information sets $A_1, \ldots, A_n$.

Proposition 4. If $A_i \subseteq A_j$, then removing i from $A_j$
does not decrease the capacity region.

The proof is intuitively clear. Given any index code,
receiver j can first recover $x_i$ using $x(A_i)$ and then use
$x_i$ along with $x(A_j \setminus \{i\})$ to recover $x_j$. Here is an
alternative proof based on the notion of confusion graph.

Proof: Assume that there exist $x^n$ and $z^n$ confusable
for the new problem $G'$, but not for the original
problem G. Then, they must be confusable at receiver j
for $G'$ (i.e., $x_j \ne z_j$ and $x(A_j \setminus \{i\}) = z(A_j \setminus \{i\})$).
Now if $x_i = z_i$, then it contradicts the assumption that
$x^n$ and $z^n$ are not confusable (at receiver j) for G.
Alternatively, if $x_i \ne z_i$, then since $A_i \subseteq A_j$ and hence
$x(A_i) = z(A_i)$, it again contradicts the assumption that
$x^n$ and $z^n$ are not confusable (at receiver i) for G.
Therefore, the confusion graphs must be the same and,
by Theorem 1, so must be the capacity regions. ■
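The invariance of the confusion graph can also be observed by enumeration. The sketch below uses an assumed three-message instance, not taken from the paper, in which $A_1 \subseteq A_2$, and contrasts it with an instance in which the hypothesis fails; the helper name and 0-based indexing are illustrative.

```python
from itertools import combinations, product

def confusion_edges(side_info, t):
    """Edge set of the confusion graph; side_info[j] is A_j (0-based)."""
    blocks = [list(product((0, 1), repeat=t_j)) for t_j in t]
    vertices = list(product(*blocks))

    def confusable(x, z):
        return any(
            x[j] != z[j] and all(x[i] == z[i] for i in A_j)
            for j, A_j in enumerate(side_info)
        )

    return {
        frozenset((x, z))
        for x, z in combinations(vertices, 2)
        if confusable(x, z)
    }

# 1-based: A_1 = {3}, A_2 = {1, 3}, A_3 = {1, 2}, so A_1 is a subset of A_2.
ORIGINAL = [{2}, {0, 2}, {0, 1}]
REDUCED = [{2}, {2}, {0, 1}]          # message 1 removed from A_2

unchanged = all(
    confusion_edges(ORIGINAL, t) == confusion_edges(REDUCED, t)
    for t in [(1, 1, 1), (2, 1, 1), (1, 2, 1)]
)

# Contrast: A_1 = {3} is not a subset of A_2 = {1}, so the hypothesis fails.
VIOLATING = [{2}, {0}, {0, 1}]
VIOLATING_REDUCED = [{2}, set(), {0, 1}]  # message 1 removed from A_2

changed = (
    confusion_edges(VIOLATING, (1, 1, 1))
    != confusion_edges(VIOLATING_REDUCED, (1, 1, 1))
)
print(unchanged, changed)
```

In the second instance the all-zero and all-one message tuples are not confusable before the removal but become confusable at receiver 2 afterwards, which shows that the condition $A_i \subseteq A_j$ is doing real work in the argument.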